How rings assign address space

Step 1: align the incoming address to self (to some power of 2)
Step 2: assign the result to the self address

Step 3: next_addr = self_addr + self_addr_space; // number of registers used locally
Step 4: send down next_addr



Example:

DMA needs 16 addresses
UART needs 4
Timer needs 256
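
The four steps can be traced with a short sketch. The names `align_up` and `enumerate_ring` are illustrative and do not come from the source. The starting address of 8 is taken as an example input, and every member's address space is assumed to be a power of 2.

```python
def align_up(addr: int, size: int) -> int:
    """Round addr up to the next multiple of size (size must be a power of 2)."""
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a power of 2")
    return (addr + size - 1) & ~(size - 1)


def enumerate_ring(start_addr: int, members):
    """Walk the ring once; members is a list of (name, address_space) pairs.

    Returns ({name: self_addr}, next_addr leaving the last member).
    """
    assigned = {}
    addr = start_addr
    for name, space in members:
        self_addr = align_up(addr, space)  # steps 1 and 2
        assigned[name] = self_addr
        addr = self_addr + space           # step 3, sent down in step 4
    return assigned, addr


assigned, next_addr = enumerate_ring(8, [("dma", 16), ("uart", 4), ("timer", 256)])
print(assigned, next_addr)
# {'dma': 16, 'uart': 32, 'timer': 256} 512
```

Aligning to the member's own size is what later allows a member to recognise its messages by comparing only the upper address bits.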



[Figure: the enumerate message enters with Addr=8. The DMA takes self=16 and sends Addr=32; the UART takes self=32 and sends Addr=36; the timer takes self=256 and sends Addr=512.]



F>9- 3 
[Figure: two flip-flops in series, Q of the first driving d of the second; the first is clocked by clk1, the second by clk2.]



If the delay between "clk1" and "clk2" is
greater than the delay from Q to d of the
second flip-flop, we have a race on our hands,
meaning the right-hand flip-flop will
sample the data of Q a whole clock period
early.
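
The condition can be written as a simple predicate. This is a simplified sketch: the function name and the nanosecond values are illustrative assumptions, and the hold-time requirement of the second flip-flop is ignored.

```python
def has_race(clk_skew_ns: float, q_to_d_delay_ns: float) -> bool:
    """True when clk2 lags clk1 by more than the Q-to-d data delay.

    clk_skew_ns: how much clk2 lags behind clk1.
    q_to_d_delay_ns: delay from the first flip-flop's clock edge until
    the new value reaches d of the second flip-flop.
    """
    return clk_skew_ns > q_to_d_delay_ns


print(has_race(clk_skew_ns=0.8, q_to_d_delay_ns=0.5))  # True
print(has_race(clk_skew_ns=0.2, q_to_d_delay_ns=0.5))  # False
```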




compound A, compound B

Clock runs with data.

The problem is a possible race.
However, we control the logic on
each flip-flop leaving the compound.
Because it is always the same standard
ring-interface module, we can ensure
that the delay will be at least enough
and, more importantly, easily checked
after layout.
[Figure: compound A and compound B exchanging data_a and data_b.]

Clock opposes data.
This arrangement has the advantage
of automatically ensuring that no race
condition exists (at least in this simple
case).



[Waveforms: clkb, clka, Qa, data_a.]

data_a, which changes after clkb
(itself later than clka), is sampled
by clka. NO RACE.



[Figure: compound A (flip-flop on clka, output Qb) and compound B (latch and flip-flop on clkb); waveforms of clock, clka, clkb and data_a with the uncertainty range marked.]



data_a leaving the bridge goes to member "b" and
should be sampled there by the rising edge of clkb.
clkb lags a lot behind clka of the bridge. As clearly
seen from the waveforms, a race is imminent. Here we
should add latches for all the data lines (~90).
Adding a latch works, however, only if the delay
between clka and clkb is less than 75% of the cycle
time; otherwise the uncertainty kills the usable time.
This sets a hard limit on the number of ring members.
Also keep in mind that latches are needed on each OK
signal between members of the ring.
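
The 75% rule and the limit it puts on ring size can be illustrated as follows. This is only a sketch: the function names are made up, delays are given in picoseconds, and the assumption that every member adds the same clock delay is not stated in the source.

```python
def latch_fix_usable(skew_ps: int, cycle_ps: int, limit_percent: int = 75) -> bool:
    """True when the clka-to-clkb delay is below the given share of the cycle."""
    return 0 <= skew_ps * 100 < limit_percent * cycle_ps


def max_ring_members(hop_skew_ps: int, cycle_ps: int, limit_percent: int = 75) -> int:
    """Largest member count whose accumulated delay still passes the check.

    Assumes (hypothetically) that each member adds the same hop_skew_ps.
    """
    if hop_skew_ps <= 0 or cycle_ps <= 0:
        raise ValueError("delays must be positive")
    return (limit_percent * cycle_ps - 1) // (100 * hop_skew_ps)


print(latch_fix_usable(skew_ps=7000, cycle_ps=10000))     # True
print(latch_fix_usable(skew_ps=7500, cycle_ps=10000))     # False
print(max_ring_members(hop_skew_ps=500, cycle_ps=10000))  # 14
```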
[Waveforms: data and clock, with the uncertainty range marked.]



Here, data_b leaves member "b" to be sampled by
clka in the bridge. But now clkb lags a lot behind
clka. This actually works to our advantage, if the
lag is smaller than the better part of a clock cycle.
This solution looks better because, between adjacent
members, we can take care to delay the data
beyond the danger zone of the clock delay; the OK
signals are covered automatically, and the last-leg
data is also covered. The only signal not safe is the
OK from the bridge to the "b" member. It will need a
latch in "b".



[Figure: a big module with data_in, local_clock and local_data_out, fed through its ring interface by data from the previous member, clk, data and clock.]



The local clock lags behind the
ring-interface clock of this
module, because we presume the
module is big. For data coming
out, this is not a problem: it changes
later than the ring-interface flip-flop
clock. However, for data entering the
module from the previous member,
a race is a possibility we must
look into.
If module "a" sends a message to module "b", the ring works
fine. However, if most of the traffic is from "c" to "b",
this is more expensive in terms of latency.



Another problem is "peak latency". Suppose that "a"
transmits mostly to "d" and "b" mostly to "c". In this case
communication between "b" and "c" suffers degradation
when the traffic peaks coincide.
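
Both effects can be seen by counting hops and shared links. The sketch below assumes a one-way ring of four members in the order a, b, c, d; that order and the function names are illustrative assumptions.

```python
RING = ["a", "b", "c", "d"]  # assumed direction of travel: a -> b -> c -> d -> a


def hops(src: str, dst: str, ring=RING) -> int:
    """Number of links a message crosses going one way around the ring."""
    return (ring.index(dst) - ring.index(src)) % len(ring)


def links(src: str, dst: str, ring=RING):
    """List of (from, to) links used by a message from src to dst."""
    n = len(ring)
    start = ring.index(src)
    return [(ring[(start + k) % n], ring[(start + k + 1) % n])
            for k in range(hops(src, dst, ring))]


print(hops("a", "b"), hops("c", "b"))  # 1 3
shared = set(links("a", "d")) & set(links("b", "c"))
print(shared)  # {('b', 'c')}
```

Under these assumptions, "c" to "b" costs three hops against one for "a" to "b", and the a-to-d and b-to-c flows compete for the link from "b" to "c".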




I** 



F<3- 13 
The land bridge gets its name from the fact that it is
a luxury. It spans across connected modules.
The idea is simple. When V2 sends a message to
D1, it gets to one side of the bridge. This side
analyzes the destination address and by some
magic (explained later) decides to short-cut the
path. The message re-appears at the other end of
the bridge and gets quickly to D1. By the same magic,
a message from V1 to D2 also gets bypassed;
a message from V1 to D1 is treated directly.




Enumeration is started by the "Anchor",
which assigns address=1 to itself. The results
of enumeration are labels 1 to 7. The land
bridge gets two addresses, as if it were not
one module: there is the "near" end, which got
enumeration label "3", and the "far" end,
marked 6.
The bridge takes responsibility for strays,
but only at the "far" end. During
enumeration, the bridge is "polarized" to
have a near and a far end. Near is the end
first struck by the enumeration message.



So we have exactly one enforcer for each
ring.



[Figure: land bridge with its near end labelled 3 and its far end labelled 11.]



In a land-bridge ring, the situation is trickier. If V2
sends a message to address=5, the land bridge
diverts it at the far end (11). It will re-appear at 3
and start cycling forever.

We have to define an algorithm that will take
care of all cases.

Luckily there is a way.

The land bridge deals only with messages arriving
at the far end and being diverted. It marks and
monitors only those. Messages arriving at the near
end keep their markings. Messages at the far end
going through are left alone.



F '5 



11 
[Figure: memberA connected to memberB by address and 64-bit data lines, with ok, scan_test, reset and clk signals.]

[Timing diagram: clk, type (idle, msgA, msgB) and ok.]



During the first clock, OK remains active while type
is msgA. It means that on the next clock
memberA may send a new message. memberA uses
this OK to send msgB on the next clock. msgB gets
stuck for a clock because OK goes inactive. It goes
inactive because the FIFO in memberB is full. One
clock later, the FIFO has a free entry, so OK returns to
1 and type returns to idle on the next clock. The return
to idle could instead be a change to the next message,
if there was one.
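
The sequence described here can be replayed with a small cycle-by-cycle model. It is a sketch under an assumed reading of the handshake: the sender holds its current message on `type` until a clock on which OK is active, then moves to the next message or to idle. The function name and the list encoding are illustrative.

```python
def simulate_type(messages, ok_per_clock):
    """Return the value driven on `type` at each clock.

    messages: messages memberA wants to send, in order.
    ok_per_clock: OK from memberB at each clock (1 = active).
    """
    pending = list(messages)
    trace = []
    for ok in ok_per_clock:
        trace.append(pending[0] if pending else "idle")
        if ok and pending:
            pending.pop(0)  # accepted; the next clock may carry a new message
    return trace


print(simulate_type(["msgA", "msgB"], [1, 0, 1, 1]))
# ['msgA', 'msgB', 'msgB', 'idle']
```

msgB stays on the ring for two clocks because OK is inactive on the first of them, matching the text.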



[Figure: ring interface with ports imsg/iok, omsg/ook and umsg/uok; member A and member B are chained through these ports.]



The incoming messages are examined first
to see whether they are supervisor or work/program
messages. Work/program messages have an address field.
We check whether it is our address. Since we know
that our address is aligned to our power of 2,
the address mask (named the split mask)
causes only a certain number of upper bits to
be compared. The lower part of the address
is passed inside as the internal address. The
upper bits are compared against the self-address
register. This register gets its value during the
enumeration protocol. The lower part of this
register is always masked. Hopefully
synthesis will delete the implementation of the
unused bits.

[Figure: the incoming address and the address split mask feed a comparator together with the care part of the self-address; the comparator gives the ours/through decision, and the lower part of the address enters the member.]
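
A minimal sketch of this address check follows. The function name, the tuple return value and the example numbers are illustrative assumptions; the only property taken from the text is that the self-address is aligned to the member's power-of-2 address space.

```python
def decode_address(incoming: int, self_addr: int, space: int):
    """Return (is_ours, internal_addr) for an incoming work/program address.

    space is the member's address space and must be a power of 2;
    self_addr is assumed to be aligned to it.
    """
    if space <= 0 or space & (space - 1):
        raise ValueError("space must be a power of 2")
    low_mask = space - 1  # bits passed inside as the internal address
    # The split mask keeps only the upper bits for the comparison.
    is_ours = ((incoming ^ self_addr) & ~low_mask) == 0
    return is_ours, incoming & low_mask


print(decode_address(incoming=291, self_addr=256, space=256))  # (True, 35)
print(decode_address(incoming=36, self_addr=256, space=256))   # (False, 36)
```

When `is_ours` is False the message is passed through, and the internal address value is not used.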
[Figure: Member containing a RIF block with Rif_i_*, Rif_* and Rif_o_* ports, module_id, Address Space = 7, and an activation register.]
[Figure: Member with outputs Rif_o_type[7:0], Rif_o_addr[19:0] and Rif_o_datal/h[31:0].]
[Figure: Member containing a RIF block with Rif_i_*, Rif_* and Rif_o_* ports, module_id, Address Space = 7, and an activation register.]
The second land bridge solves most traffic problems, but
adds 4 clocks to the overall ring length. This is not a big
problem because no message should travel the whole
perimeter.




The Utopia interface is
forced into a mode that
communicates in
messages, not cells. We are
using the I/O and maybe
some of the logic.



Application Specific Accelerators: CRC, Encryption, Table Lookup, Hashing

Internal Memory: Fast, Unified, Multi-port

Network Processor

Peripheral Expansion: Enet, ATM, Uart, USB, Serials

System Expansion Area

CPU (PP), DMA, Smart FIFO, Ext. mem I/F

Vobla Compounds
32 registers of 32 bits per task

A set of [illegible] per task, which control task execution scheduling

An interface to adjacent resources

Fast memory accessed by load/store instructions

Per task and global registers

Initially by the PP

Big, accessed via a DMA interface
s - sticky bit
- equal/zero
lt - less than/negative
gt - greater than/positive
carry
mb - reflection of the HAM multi-reader busy indication
Frame structure of an example task type

  common task data
  task fragment 1 data
  task fragment 2 data
  task fragment 3 data
  data of all level 2 functions
  level 1 f1 data
  level 1 f2 data
  level 1 f3 data

  Size of the level 0 frame part is different for each task type.
  Size of the level 2 frame part is constant.
  Size of the level 1 frame part is different for each task type.
[Figure legend: Flip Flop.]
[Figure: multireader agent. Signals: type_in, interface_address, interface_data[31:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive, memory data. The data path is shown as bytes byte[0] to byte[7].]

[Figure: agent command (AID = multireader) and the vobla entry in the multireader. Entry fields include options (op3, op2, op1, op0), source address in SRAM, address of destination, and number of bytes to transfer.]
[Figure: message sender agent. Vobla agent interface, main entry, shadow entry, output message encoder; signals type_out[7:0], address_out[23:0], data_out[63:0], reset, clk, ok2drive.]

[Figure: agent command (AID = message_sender) with fields opcode, options[9:0], RA, AID[4:0], RB/imm8. With option[6] = 0 the message sender entry holds options[5:0], raw_data[31:0], raw_address[23:0] and type[7:0]. With option[6] = 1 it holds options[5:0], raw_data[63:0] and raw_address[23:0]. The fields map to message data, address of destination and message type.]



[Figure: DMA agent. Doorbell set-mask outputs, Vobla agent interface, stall vobla, token control, cid[5:0], two request entries, DMA context table, input message decoder, output message encoder; signals type_in[7:0], interface_address[23:0], interface_data[31:0], base_address[7:0], type_out[7:0], address_out[23:0], data_out[63:0], ok2drive, reset.]
[Figure: agent command (AID = dma agent) and the DMA request entry. Entry fields include option bits (urgent, dir, autosend, and others that are illegible), DRAM address, and number of bytes to transfer or address modifier.]



[Figure: CRC agent. Vobla agent interface, stall vobla, multireader data[31:0], register file, input buffer, data mux, on-demand CRC data, random number generator, CRC machine, checksum machine, bip16 machine, clk, reset.]

[Figure: agent command (AID = CRC agent) with fields opcode, options[9:0], RA, AID[4:0]. Options cover CRC type, data size, generate/check operation and overwrite residue. Operands: DATA[63:32], DATA[31:0], RESIDUE.]
[Figure: timer agent. Vobla agent interface, control register, counter, prescale divide by 2, time stamp register, clk, reset.]

[Figure: agent command with fields opcode, options[9:0], RA, AID, RB/imm8.]
[Figure: doorbell agent. Vobla agent interface with task switch, set doorbell mask, next task valid, v_current_task_id[5:0], v_current_task_valid, v_next_task_id[5:0], doorbell_req[2:0], doorbell mask, doorbell count[1:0] and urgent. Blocks: input message decoder, task mask register file, task request register file and counters, TGMR, mask control, priority logic. Ring-side signals: rif_i_write, rif_i_doorbell_cs, rif_i_addr[5:0], rif_i_data[4:0], dm_set_mask0 to dm_set_mask3, rif_i_reset, rif_i_clock.]

[Figure: agent command (AID = doorbell) with fields opcode, options[9:0], RA, AID[4:0], RB/imm8. Option bits: set/clear global (OP2), clear request (OP1), clear mask (OP0).]



{0,0,0,0,0,mask_bit_index[2:0]}   write mask
or

{0,0,0,0,1,req_bit_index[2:0]}    write request
or

{1,0,0,0,0,count_value[2:0]}      write counter
or

{0,1,0,0,0,0,0,0}                 write TGMR
or

{1,1,0,0,0,0,0,urgent_value}      write urgent
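
The five write encodings can be sketched as byte builders. This rests on an assumed reading of the figure: each brace list is taken as one 8-bit value written most significant bit first, with a 3-bit index or count in the low bits and a 1-bit urgent value. The function names are illustrative only.

```python
def _field3(value: int) -> int:
    """Validate a 3-bit field (0..7)."""
    if not 0 <= value <= 7:
        raise ValueError("field must fit in 3 bits")
    return value


def write_mask(mask_bit_index: int) -> int:
    return 0b00000_000 | _field3(mask_bit_index)


def write_request(req_bit_index: int) -> int:
    return 0b00001_000 | _field3(req_bit_index)


def write_counter(count_value: int) -> int:
    return 0b10000_000 | _field3(count_value)


def write_tgmr() -> int:
    return 0b0100_0000


def write_urgent(urgent_value: bool) -> int:
    return 0b1100_0000 | (1 if urgent_value else 0)


print([write_mask(2), write_request(5), write_counter(3), write_tgmr(), write_urgent(True)])
# [2, 13, 131, 64, 193]
```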



yo 
[Figure labels: rings input; cycle for rings.]
[Figure: Trajan chip connected through its memory port (CS, Data, FIFO_RD, FIFO_WR) to a Memory and to an FPGA holding a WR FIFO and an RD FIFO. The FPGA connects to Encryption, DSP and PCI devices. Inside the Trajan: Host #1 (PP), Host #2 (DMA), Host #3 (Ring extender), memory controller, ring interface, DATA, write request generator, WR FIFO used words counter, message sender, read request generator, RD FIFO used words counter.]
[Figure: FPGA. Trajan memory interface with Trajan FIFO_RD and FIFO_WR inputs, DPR, host address generator, RAM / FIFO, target address generator, three synchronizers, interrupt, and the interface to the external device (device specific).]
[Figure: enet rx mac on the ring, with rx manager, doorbells, ring control and a free indication.]

setup registers:
  doorbell, task id and viscode
  urgent viscode
  header len and address
  threshold to urgent

rx status word: crc, ovrn, err, last, size
[Figure: tx mac with ring in, fifo, ram, setup, tx manager, tx manager doorbell, free entry count, finished frames count and ring control.]

setup registers:
  doorbell, task id and viscode
  urgent viscode
  send ahead address
  threshold to transmit
  threshold to urgent



APP ID=10064333

Page 268 of 280



Control Plane

Signaling Protocols
Protocol Management
Exception Handling
System Control



Data Plane

Per-packet handling
Forwarding Decision
Classification
QoS Handling
Queuing
[Figure: pipeline stages: Instruction fetch request; Instruction decode; Address calculation; Read source registers; Write result into destination register. A second row shows Data access req and Data execution. Legend: Flip Flop, Logic.]
[Figure: resources area, with a port holding an Rx FIFO and an Rx Doorbell, and External SDRAM.]







[Figure labels: snap; ATM (AAL types, names illegible).]

Protocol specific data path functional blocks

Generic data path processing functional blocks

Generic system service functional blocks

Fault and exception report

Vobla maintenance









FIG. 87 






[Figure: CPU and ALU blocks; other labels illegible.]

FIG. 88














